AITopics

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Neural Information Processing SystemsFeb-10-2026, 11:21:37 GMT

cebd648f9146a6345d604ab093b02c73-Paper.pdf

algorithm, diffusion, langevin dynamic, (16 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
(6 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Thanh Huy Nguyen, Umut Simsekli, Mert Gurbuzbalaban, Gaël RICHARD

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Neural Information Processing SystemsAug-19-2025, 22:59:02 GMT

In this study, we provide formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit.

exit time, noise, sgd, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Neural Information Processing SystemsAug-16-2025, 13:31:18 GMT

Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization

Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales.

algorithm, diffusion, langevin dynamic, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
(6 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Dixit, Rishabh, Gurbuzbalaban, Mert, Bajwa, Waheed U.

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

arXiv.org Artificial IntelligenceOct-6-2023

This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the 'flat' geometry around saddle points, first-order methods can struggle to escape these regions in a fast manner due to the small magnitudes of gradients encountered. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing analytic techniques do not explicitly leverage the local geometry around saddle points in order to control behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for gradient-descent method under certain necessary initial conditions, which explicitly bring out the dependence on problem dimension, conditioning of the saddle neighborhood, and more, for a class of strict-saddle functions.

exit time, saddle point, trajectory, (15 more...)

doi: 10.1093/imaiai/iaac025

2006.01106

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts (0.04)
(5 more...)

Genre:

Research Report (0.63)
Workflow (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (0.45)
Government > Military (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Dixit, Rishabh, Gurbuzbalaban, Mert, Bajwa, Waheed U.

Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima

arXiv.org Artificial IntelligenceJul-13-2023

This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle-points and convergence to local minima through a both asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG, and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the local regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minima and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.

artificial intelligence, machine learning, sequence, (19 more...)

2307.0703

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Education (0.47)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.45)

arXiv.org Artificial IntelligenceJun-16-2023

Power-law Dynamic arising from machine learning

Chen, Wei, Du, Weitao, Ma, Zhi-Ming, Meng, Qi

We successfully train deep neural networks (DNN) and achieve big breakthroughs in AI tasks, such as computer vision [7, 8, 14], speech recognition [21, 23, 24] and natural language processing [5, 26, 27], etc. Stochastic gradient descent (SGD) is a mainstream optimization algorithm in deep machine learning. Specifically, in each iteration, SGD randomly sample a mini batch of data and update the model by the stochastic gradient. For large DNN models, the gradient computation over each instance is costly. Thus, compared to gradient descent which updates the model by the gradient over the full batch data, SGD can train DNN much more efficiently.

artificial intelligence, machine learning, stationary distribution, (14 more...)

2306.09624

Country: Asia > China (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)

Arnaboldi, Luca, Krzakala, Florent, Loureiro, Bruno, Stephan, Ludovic

Escaping mediocrity: how two-layer networks learn hard single-index models with SGD

arXiv.org Artificial IntelligenceMay-29-2023

This study explores the sample complexity for two-layer neural networks to learn a single-index target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d\log{d})$ samples are typically needed. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.

artificial intelligence, initialization, machine learning, (17 more...)

2305.18502

Country:

North America > United States (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Artificial IntelligenceSep-26-2022

Controlling mean exit time of stochastic dynamical systems based on quasipotential and machine learning

Li, Yang, Yuan, Shenglan, Xu, Shengyuan

The mean exit time escaping basin of attraction in the presence of white noise is of practical importance in various scientific fields. In this work, we propose a strategy to control mean exit time of general stochastic dynamical systems to achieve a desired value based on the quasipotential concept and machine learning. Specifically, we develop a neural network architecture to compute the global quasipotential function. Then we design a systematic iterated numerical algorithm to calculate the controller for a given mean exit time. Moreover, we identify the most probable path between metastable attractors with help of the effective Hamilton-Jacobi scheme and the trained neural network. Numerical experiments demonstrate that our control strategy is effective and sufficiently accurate.

artificial intelligence, machine learning, quasipotential, (20 more...)

doi: 10.1016/j.cnsns.2023.107425

2209.13098

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-15-2021

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Bedi, Amrit Singh, Parayil, Anjaly, Zhang, Junyu, Wang, Mengdi, Koppel, Alec

Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions of heavier tails defined by tail-index parameter alpha, which increases the likelihood of jumping in state space. Doing so invalidates smoothness conditions of the score function common to PG. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index alpha, a Holder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit and transition time analysis of a suitably defined Markov chain, identifying that policies associated with Levy Processes of a heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, which we corroborate also manifests in improved performance of policy search, especially when myopic and farsighted incentives are misaligned.

exit time, expression, inequality, (14 more...)

arXiv.org Machine Learning

2106.08414

Country:

North America > United States > Maryland > Prince George's County > Adelphi (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
(2 more...)